Author: Jesse
Clous
Geog. 350 Introduction to Data Acquisition
Barry Bonds: A GIS Analysis to Homerun Production
---------------------------------------------------------------------------------
Abstract
The major focus of the project was to take attributes leading to Barry Bonds' homerun production and display them graphically. This project geo-references all of Barry Bonds' homerun events for cross-examination with the average temperature of homerun event locations. To perform attribute queries about homerun production and spatial queries with climate data, a GIS model was created. The GIS model performs complex queries about Barry Bonds' homerun attribute information and spatial relationships to average temperature at homerun locations. Barry likes to hit most of his homeruns in
---------------------------------------------------------------------------------
Introduction
In
this report, GIS will be used to analyze and calculate attributes that relate to the total homerun production of Barry Bonds. Homerun production will also be cross-examined with the average climate of homerun locations. Barry Bonds is the lifetime homerun leader for Major League Baseball with 762 career homeruns (cbssports).
Goals:
Objectives:
This project will study attributes leading to Barry Bonds' all-time homerun record and will answer the following questions:
Where did Barry Bonds hit most of his homeruns?
Where did he hit more homeruns off of left-handed pitchers vs. right?
What month did he hit the most homeruns and where?
In what climate did he hit the most homeruns?
To
answer these questions, a complete homerun log of Barry Bonds will be needed along with field locations and average climate information. Once these are input into the GIS, analysis can be performed to answer the objective questions. Answering them requires both analytical and spatial analysis through the use of GIS. Questions such as what pitcher Barry Bonds hit the most homeruns off of, on what day and where he hit homerun 500, and how many homeruns he hit off of the Angels in June 1999 against left-handed pitching can be answered by attribute queries.
A GIS model will be created to perform attribute querying, spatial querying, and quantitative analysis, and will then be used to map spatial query results. Attribute tables and new layers will be created to analyze the performed queries. Functions such as summarize and statistics will be used to calculate nominal homerun data.
Depending
on the availability of the data and the scope of analysis to be performed, this project will take approximately 40 hours, spread out through the academic semester. This is a student project for Geog 334, Introduction of Software Applications, and is intended to be presented to the class to display the use of the introductory GIS application functions involved in the project. The results of the project are also intended to be viewed by anyone who is interested in Barry Bonds and wants to know statistical information about his lifetime homerun record. Results of the project will be presented in an in-class PowerPoint oral report, including maps and analytical results about homerun production.
---------------------------------------------------------------------------------
Background
Professional baseball today is increasingly using GIS applications to perform functions in baseball that were unavailable prior to GIS. The Oakland Athletics created a raster model to measure the impact opposing players and teams have on them and to analyze their players', as well as prospective players', impact on their organization (Lewis).
Construction of the
The Oakland Athletics are always in the hunt for the playoffs even though they have one of the smallest payrolls in baseball. Why? The A’s have incorporated GIS in their franchise to manage layers and look for new talent. Michael Lewis, author of Moneyball, details how the Athletics use technology to their advantage. They use GIS to evaluate current and prospective players by spatially rating players' plays through raster graphics of a baseball field, so they acquire talent not by price and statistics but by what attributes they are lacking or have in excess, and make personnel decisions based on that.
A good overview of the GIS application by the
New
Prior to the development of Safeco Field in
Sports Illustrated: Great American Sports Atlas
What started out as a 50th anniversary campaign for Sports Illustrated (SI) became an exclusive 32-page article highlighting GPS, resulting in the SI Sports Atlas. SI wanted to run a weekly feature for its 50th year highlighting the best athlete from each state and describing where the player came from. Then it grew into all players of all sports, which in turn became the basis for the GIS model and the beginning of the Sports Atlas.
This article appears to mark the beginning of using complex GIS in sports. It can be used by scouts to see what regions to target for particular kinds of athletes or by general fans for entertainment. Team owners are using GIS to view their fan bases and market the team. After reading this article, I am positive that there is a huge future for GIS in sports.
Develop a Suitable Model
A model was developed prior to data acquisition to define the methodology for developing the GIS. Using ModelBuilder in ArcMap, a detailed model including GIS functions displays the structure used to develop the GIS model.
---------------------------------------------------------------------------------
Data Collection
Data needed for the GIS model include a Barry Bonds homerun log with locations of homeruns, homerun number, team against, team for, pitcher, pitcher hand, and date. X, Y coordinates of the baseball fields will be needed to create field location points. Average climate information will also be needed to perform spatial analysis against homeruns.
Barry
Bonds' homerun data was found at cbssports.com and was available in .txt format. From the cbssports.com website, the homerun log was cut and pasted directly into a Microsoft Excel spreadsheet. Attributes of this table, called New_Homerun_log, include Homerun, Date, Pitcher, Team Against, Team for, and Field. The raw data presented a few major problems once it was pasted into Excel: the date column was in dd/mm/yyyy format and needed to be separated for attribute querying, there were spaces in the column headings, and there were no x,y attributes for field locations. The date column was cut and pasted into Microsoft Word, and the find and replace function was used to replace all slashes with commas. Next, the newly created Word document was opened and saved in Notepad so the data could be imported back into Excel in comma-delimited format.
The column headings were changed to remove all spaces, and x,y attributes were added from the field location table discussed next. Once the date data was imported back into Excel, a complete homerun log with locations of homeruns, homerun number, team against, team for, pitcher, pitcher hand, and date was created.
Homerun Log
Next
stadium locations needed to be defined. A .kml file was found at 252.pair.com and provided an XML listing of all current baseball stadiums in Major League Baseball (Pair.com). The .kml file was produced with Google Earth and was available in .html format. The data was cut and pasted into Notepad and then imported into Excel, separating column headings at line breaks. The spreadsheet was cleaned up to read just the stadium name and x, y coordinates so that it could be imported into ArcMap, giving the GIS all field locations (App. 4). The x,y data for each field location was added to the homerun log so that individual homeruns could be spatially referenced.
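The log clean-up described above was done by hand in Excel, Word, and Notepad. As a rough illustration only, the same three steps (removing spaces from headings, splitting the dd/mm/yyyy date, and attaching x,y coordinates by field name) could be sketched in Python. The sample rows, the stadium names, the coordinates, and the function name `clean_log` are all hypothetical and are not taken from the actual data.

```python
import csv
import io

# Hypothetical sample rows; the real log came from cbssports.com.
RAW_LOG = """Homerun,Date,Pitcher,Team Against,Team for,Field
1,01/02/2000,Pitcher A,Team X,Team Y,Example Park
2,15/08/2001,Pitcher B,Team Z,Team Y,Sample Field
"""

# Hypothetical stadium table: field name -> (x, y) as longitude, latitude.
FIELD_XY = {
    "Example Park": (-122.39, 37.78),
    "Sample Field": (-80.01, 40.45),
}


def clean_log(raw_text, field_xy):
    """Strip spaces from headings, split dd/mm/yyyy dates, attach x,y."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Remove spaces from column headings, as was done by hand in Excel.
        clean = {key.replace(" ", "_"): value for key, value in row.items()}
        # Separate the date into numeric Day, Month, and Year columns.
        day, month, year = (int(part) for part in clean.pop("Date").split("/"))
        clean.update(Day=day, Month=month, Year=year)
        # Look up coordinates by field name; None marks an unmatched field.
        clean["X"], clean["Y"] = field_xy.get(clean["Field"], (None, None))
        rows.append(clean)
    return rows


cleaned = clean_log(RAW_LOG, FIELD_XY)
for row in cleaned:
    print(row["Homerun"], row["Year"], row["Month"], row["Field"], row["X"], row["Y"])
```

Keeping day, month, and year as separate numeric columns is what makes a later query such as "Month" = 8 possible.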
.kml File
X, Y Table
Field Locations
Average temperature data was found at the TEMP0313 The states shapefile, named states.shp, was provided by Environmental Systems Research Institute, Inc. (ESRI) and obtained via the student folder. The projection used in the GIS model is Equidistant_Conic, and all data sets will be either defined or redefined to fit that projection.
Imported Shapefiles and Data
Once all the data was collected and refined, it was imported into ArcMap. The x,y stadium location table was imported into ArcMap and the display x,y coordinates function was run. The data was exported as a .shp file and projected in North_American_Equidistant_Conic. The same process was done for the homerunlog.table. The states.shp file was added to the GIS along with the TEMP0313.shp file. The TEMP0313.shp file was projected to North_American_Equidistant_Conic, matching the states.shp projection.
All imported data is projected to North_American_Equidistant_Conic. Now layers can be created to perform spatial and analytical analysis.
New Shapefiles and Data
The climate.shp layer contained one shapefile with nine different temperature values in the DEG_F column of its attribute table. To perform inside spatial queries with homerun locations, each of the nine temperature values was selected individually (for example, "DEG_F" = 'B 32.0 - 40.0') and nine new layers were created, one for each temperature zone.
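To illustrate what an "inside" spatial query against one of these temperature layers involves, here is a minimal sketch in Python. It uses a simple ray-casting point-in-polygon test, which is a stand-in chosen for illustration and not a description of how ArcMap implements the query. The zone polygons, the homerun points, and the function names are all hypothetical.

```python
# Hypothetical zone polygons keyed by temperature label; each polygon is a
# list of (x, y) vertices. Real zones would come from the climate shapefile.
ZONES = {
    "55.1 - 60.0": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "60.1 - 65.0": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}

# Hypothetical homerun points: (homerun number, x, y).
HOMERUNS = [(1, 2.5, 3.0), (2, 15.0, 5.0), (3, 4.0, 9.0), (4, 30.0, 5.0)]


def point_in_polygon(x, y, polygon):
    """Ray-casting test: toggle on each edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def homeruns_by_zone(homeruns, zones):
    """Return {zone label: [homerun numbers located within that zone]}."""
    return {
        label: [hr for hr, x, y in homeruns if point_in_polygon(x, y, polygon)]
        for label, polygon in zones.items()
    }


layers = homeruns_by_zone(HOMERUNS, ZONES)
print(layers)
```

A point that falls in no zone, such as homerun 4 in this sample, simply appears in no output list.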
Next, the spatial query was performed by selecting all homeruns that are within each of the newly created temperature layers, and nine new layers were created.
New Climate Layers
>70.0, 65.1-70.0, 60.1-65.0, 55.1-60.0, 50.1-55.0, 45.1-50.0, 40.1-45.0, 32.0-40.0, <32.0
New Homerun Layers
4_HR, 5_HR, 6_HR, 7_HR, 8_HR, 9_HR
A homerun layer was not needed for all of the temperature zones because not all temperature zones contained homeruns.
Statistics for the attribute 'field' in the all homerun table were performed to summarize homerun totals per stadium.
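The per-stadium summary amounts to counting rows grouped by the field attribute. A minimal sketch of that idea in Python follows. The records and the function name `summarize_by_field` are hypothetical, and the counts are not actual results.

```python
from collections import Counter

# Hypothetical records; each one stands for a row of the all homerun table.
ALL_HR = [
    {"Homerun": 1, "Field": "Example Park"},
    {"Homerun": 2, "Field": "Sample Field"},
    {"Homerun": 3, "Field": "Example Park"},
]


def summarize_by_field(records):
    """Count homeruns per stadium, most frequent stadium first."""
    return Counter(record["Field"] for record in records).most_common()


totals = summarize_by_field(ALL_HR)
for field, count in totals:
    print(f"{field}: {count}")
```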
This information was added to the Ballparks.shp attribute table.
An attribute query was performed on the all_homerun layer, with "hand" = 'L' to select homeruns off left-handed pitchers and "hand" = 'R' for homeruns off of right-handed pitchers.
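An attribute query of this kind is a filter on field values. The sketch below shows the idea in Python, including how several criteria combine with AND. The rows, the field names as spelled here, and the helper `select_by_attribute` are hypothetical and only mimic the ArcMap selection.

```python
# Hypothetical rows standing in for the all_homerun attribute table.
ALL_HR = [
    {"Homerun": 1, "Year": 1999, "Month": 8, "Hand": "L"},
    {"Homerun": 2, "Year": 1999, "Month": 8, "Hand": "R"},
    {"Homerun": 3, "Year": 2001, "Month": 4, "Hand": "L"},
]


def select_by_attribute(records, **criteria):
    """Return records whose fields equal every given criterion (AND query)."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


lh_hr = select_by_attribute(ALL_HR, Hand="L")   # like "hand" = 'L'
rh_hr = select_by_attribute(ALL_HR, Hand="R")   # like "hand" = 'R'
aug_1999_lh = select_by_attribute(ALL_HR, Year=1999, Month=8, Hand="L")
print(len(lh_hr), len(rh_hr), [r["Homerun"] for r in aug_1999_lh])
```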
A layer was created for each of the selected attributes: LH_HR and RH_HR.
New Homerun Layers
LH_HR, RH_HR
---------------------------------------------------------------------------------
Analysis
Where did Barry Bonds hit most of his homeruns?
Homeruns
Where did he hit more homeruns off of left-handed pitchers vs. right?
As with the total homeruns, Barry hit most left-handed and right-handed homeruns with the San Francisco Giants: 212 RH, 88 LH. Unlike total homeruns, Barry did not hit the second most homeruns in
Homeruns by Pitcher Hand
What month did he hit the most homeruns and where?
Barry hit the most homeruns in the month of August with 148. The most homeruns hit in August in
In what climate did he hit the most homeruns?
Barry hit 326 homeruns in the 55.1-60.0 degree temperature zone.
Homeruns vs. Climate
Besides
the specific examples answering the project objective questions, the GIS model created is capable of performing complex and highly specific attribute queries. Below is a specific question that can be answered through a combination of spatial and attribute queries.
How many homeruns did Barry hit in a climate range of 45.1-55.0 degrees against left-handed pitchers in the month of August in 1999?
Select by Attribute for All_HR: "Year" = 1999 AND "Month" = 8 AND "Hand" = 'L'
Select by Location from the selected features in the All_HR layer that are within the features of the 45.1-50.0 layer.
On August 20, 1999, Barry Bonds hit homerun number 431 off left-handed pitcher Rafael in
---------------------------------------------------------------------------------
Conclusion
The
GIS model created in this report answers all objective questions and allows for specific attribute searches of Barry Bonds' homerun production. Although the model is successful, there is much room for improvement. Creating graphics with this model was a challenge because most of the power in the GIS was based on attribute relationships, not spatial ones. This does, however, provide the grounds for an improved GIS model to apply to baseball. I would like to pursue this project in the future by applying raster graphics that detail individual plays. If I am able to locate play logs and charts, I should be able to make a baseball game come to life in GIS. I would be able to show illustrated maps detailing spatial occurrences in baseball. Stay tuned for more GIS in baseball!
---------------------------------------------------------------------------------
References
252.pair.com. Web. 11 Oct 2009. <http://www252.pair.com/comdog/google_earth/major_league_baseball_stadiums.kml>.
"Bonds Career Homerun Log." CBSSportsMLB. CBSSPORTS, Web. 10 Oct 2009. <http://www.cbssports.com/mlb/bondstracker/bondslog>.
Lewis, Michael. Moneyball: The Art of Winning an Unfair Game. New York: Norton and Company, Inc., 2003. Print.
Moore, Patrick. "Building a Baseball Stadium Using GIS." integralgis.com. 05/14/2002. Integralgis Inc., Web. 7 Dec 2009. <http://www.integralgis.com/pdf/stadium.pdf>.
"ncdc.noaa.gov." 4/26/2006. National Climatic